WAVELET THRESHOLDING FOR NONNECESSARILY
GAUSSIAN NOISE: FUNCTIONALITY

By R. Averkamp and C. Houdré

Freiburg University, and Université Paris XII and
Georgia Institute of Technology

For signals belonging to balls in smoothness classes and noise with 
enough moments, the asymptotic behavior of the minimax quadratic 
risk among soft-threshold estimates is investigated. In turn, these re- 
sults, combined with a median filtering method, lead to asymptotics 
for denoising heavy tails via wavelet thresholding. Some further com- 
parisons of wavelet thresholding and of kernel estimators are also 
briefly discussed. 

1. Introduction. The model considered throughout these notes is the 
familiar one. The data takes the form 

(1.1) $X_i = f_i + \frac{e_i}{\sqrt{n}}$, $i = 1, \dots, n$, $n = 2^h$, $h \in \mathbb{N}$,

where $f = (f_i)$ is the signal to estimate and where the noise $e = (e_i)$ is
such that the $e_i$ are zero mean i.i.d. random variables. One thinks of $f_i$ as
$f_i = f_{n,i} = f(i/n)/\sqrt{n}$, so it is assumed that the data is sampled from a
signal at the rate $1/n$ and then multiplied by $1/\sqrt{n}$. Applying a discrete
wavelet transform (associated to an orthonormal wavelet basis, adapted to 
an interval and generated by a compactly supported wavelet) to the data 
leads to the noisy wavelet coefficients 

(1.2) $W_k = \theta_k + z_k$, $k = 1, \dots, 2^{j_0}$,
and

(1.3) $W_{j,k} = \theta_{j,k} + z_{j,k}$, $j_0 \le j \le h-1$, $k = 1, \dots, 2^j$,

where to simplify notation the dependence on $n$ has been omitted (in particular,
a factor $1/\sqrt{n}$ is omitted). Thresholding is then applied to the transformed
data and the signal is recovered by applying an inverse transformation to the
thresholded data [8]. In contrast to the ideal framework [5], in the
functional framework the performance of estimators is no longer compared
to a benchmark but instead the possible values of $(\theta_{\cdot,\cdot})$ are restricted to
belonging to a ball in a smoothness class. To be more precise, it is assumed
that 

(1.4) 

for some constant $A$, where $s := m + 1/2 - 1/p$ and $m > 1/p$. The condition
$m > 1/p$ ensures that we deal with well-defined real-valued functions
(and not generalized ones) in the Besov space $B^m_{p,q}$. [Recall that $m$ is the
degree of smoothness of the function whose modulus of smoothness is locally
quantified via (norms involving) the parameters $p$ and $q$.] Next, if the
$(\theta_{\cdot,\cdot})$ are the wavelet coefficients of a function $f$ and if the wavelet basis is
sufficiently smooth, then $\|f\|_{B^m_{p,q}} \le C_1 A$, where $C_1 = C_1(m,p,q)$ is a constant
and $1 \le p, q \le +\infty$. Also, considering quasi-norms rather than norms, similar
results hold in the cases $0 < p < 1$ or $0 < q < 1$ (we refer the reader to
[3] and [15] for a much more extensive and precise list of references and further
information on wavelets and function spaces). We also note here that
the Besov assumption can be replaced by a Triebel-Lizorkin one throughout
much of the paper. Indeed, it is well known that the equivalence between the
sequence space (quasi-)norm and the function space (quasi-)norm is what
matters here. In view of this equivalence, we will slightly abuse notation and
use $\|\cdot\|_{B^m_{p,q}}$ for the norm on the sequence space.

In this framework, Donoho, Hall, Johnstone, Kerkyacharian, Picard, Sil- 
verman and Yu compute minimax bounds of estimation [6, 7, 8, 9, 10, 11, 12], 
and show the corresponding optimality of wavelet thresholding. In particular,
if the $e_i$ (hence, the $z_{j,k}$) are i.i.d. normal random variables, then the
minimax rate in this model is $n^{-2m/(2m+1)}$, that is,

(1.5) $\inf_{\hat\theta} \sup_{\|\theta\|_{B^m_{p,q}} \le A} E\|\hat\theta - \theta\|_2^2 \sim C n^{-2m/(2m+1)}$,

where the infimum is taken over all estimators and where C is a positive 
constant which depends on the variance of the noise, as well as on m, p, q and 
A. [Throughout these notes, || • ||2 is the Euclidean norm. From Parseval's 
identity and the equivalence between sequence and functional spaces, we 
thus see that (1.5) has an equivalent formulation at the function space level.] 
Moreover, estimators based on soft thresholding achieve this rate. 

These early results were then extended to some classes of non-Gaussian 
noise by Neumann and Spokoiny [16] and Delyon and Juditsky [4]. It is 
shown in [16] that, for noise having finite moments of all orders (and L?'-differentiable
density), soft thresholding achieves the same rate as soft thresholding
for Gaussian noise. Furthermore, the actual performance, not just the
rate, is the same. In [4], it is shown that, for more general distributions, soft 
thresholding can achieve the same rate as soft thresholding in the Gaus- 
sian case. In addition, under somewhat stronger conditions, the ratio of the 
minimax risk for Gaussian noise and other types of noise tends to one [16]. 

It is our purpose to further explore these topics here. Let us briefly dis- 
cuss the contribution of the present paper. First, in Section 2 we show that 
if the noise only fulfills some moment conditions, soft thresholding actu- 
ally achieves the same asymptotic performance as soft thresholding in the 
Gaussian case. In fact, it is shown that for soft thresholding the liminf of
the ratio of the minimax risk for Gaussian noise and this type of noise is
at least one. These results are then used, in Section 3, to tackle the
estimation problem for noise with heavy tails. Median filtering the data first
makes the previous moment conditions hold; applying wavelet thresholding to
the filtered data, it is then still possible to have the same minimax rate as in
the Gaussian case. To complete our study of wavelet thresholding methods,
we return to the normal framework and present some concluding remarks 
comparing thresholding and kernel estimators with varying bandwidth. 

2. Moment conditions. Our first statement is the core result of this sec- 
tion. To prove it, a fair amount of technical preparation is needed and the 
main part of the proof is postponed to the Appendix. However, we state and 
prove below some preparatory lemmas and indicate their use in the proof of 
the theorem. 

In the sequel $\Phi$ denotes the standard normal distribution function, and
$E_\Phi$ is expectation with respect to $\Phi$. Using the notation of [1], for any
$\lambda > 0$, $T^S_\lambda$ denotes the soft thresholding operator given by
$T^S_\lambda(x) = (|x| - \lambda)^+ \mathrm{sgn}(x)$, $x \in \mathbb{R}$. Also, throughout the section the wavelet transform is as
in [1], Section 4; in particular, the wavelet is assumed to be Hölder continuous
of index $\beta > 0$.
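
As a small illustration of the operator just defined, the following Python sketch implements $x \mapsto (|x| - \lambda)^+ \mathrm{sgn}(x)$ for a single coefficient and for a list of coefficients. The function names are hypothetical and chosen for this example only; nothing here is taken from [1].

```python
def soft_threshold(x: float, lam: float) -> float:
    """Soft thresholding: (|x| - lam)^+ * sgn(x), i.e. shrink x toward 0 by lam."""
    if lam < 0:
        raise ValueError("the threshold must be nonnegative")
    shrunk = max(abs(x) - lam, 0.0)
    return shrunk if x >= 0 else -shrunk


def soft_threshold_all(coeffs, lam):
    """Apply the same threshold lam to every coefficient in an iterable."""
    return [soft_threshold(c, lam) for c in coeffs]


if __name__ == "__main__":
    # Coefficients below the threshold are set to zero, the others are shrunk.
    print(soft_threshold_all([3.0, -3.0, 0.5, -0.2], 1.0))
```

The operator is 1-Lipschitz, a property that is used later when the bias introduced by median filtering is separated from the thresholding risk.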

Theorem 2.1. Let the model be given via (1.1)-(1.4), where $p, q \ge 1$
and $m > 1/p$ and where the $e_i$ have variance one. Let also the $e_i$ have finite
moments of order $L$, where $L$ is such that

(2.1) $L > \frac{6}{2m/(2m+1)}$ if $p \ge 2$

and

(2.2) $L > \frac{6(m + 1/2 - 1/p)(2m+1)}{(m + 1/2 - 1/p)(2m+1) - m}$ if $1 \le p < 2$.
Moreover, let the $e_i$ be symmetric. Then

(2.3) liminf , ^ — > 1. 

Above, the requirement of symmetry is imposed for technical reasons 
(we preserve the zero mean property of the wavelet transform of truncated 
noise). This requirement can be circumvented by more technical efforts in 
the proof. The i.i.d. assumption on the noise e is not really needed either. 
Independence and $\sup_i E|e_i|^L < +\infty$, where $L$ satisfies either (2.1) or (2.2),
will do, with also a variance level of $0 < \sigma^2 = \sup_i E e_i^2 < +\infty$.

Let us illustrate the moment conditions to be satisfied: $p > 2 \Rightarrow L = 12$;
$p = 2 \Rightarrow L > 12$; $p = 2, m \to +\infty \Rightarrow L > 6$; $p = 1 \Rightarrow L > 18$;
$p = 1, m \to +\infty \Rightarrow L > 6$; $p = 3/2, m = 1 \Rightarrow L > 10$;
$p = 3/2, m = 2 \Rightarrow L > 7.7$. Note that
$L > 6$ is the least moment condition imposed above.
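
The numerical values quoted above can be checked directly from (2.1) and (2.2). The sketch below evaluates the right-hand sides of these two conditions, reading (2.1) as $6/(2m/(2m+1))$ and (2.2) as $6s(2m+1)/(s(2m+1) - m)$ with $s = m + 1/2 - 1/p$; the function name is hypothetical.

```python
def moment_lower_bound(m: float, p: float) -> float:
    """Right-hand side of (2.1) (p >= 2) or (2.2) (1 <= p < 2).

    Theorem 2.1 asks for finite moments of some order L strictly larger than
    the returned value. Assumes p >= 1 and m > 1/p, as in the theorem.
    """
    if p < 1 or m <= 1.0 / p:
        raise ValueError("need p >= 1 and m > 1/p")
    if p >= 2:
        return 6.0 * (2.0 * m + 1.0) / (2.0 * m)
    s = m + 0.5 - 1.0 / p
    t = s * (2.0 * m + 1.0)
    # t - m > 1/2 because s > 1/2 whenever m > 1/p, so the ratio is well defined.
    return 6.0 * t / (t - m)


if __name__ == "__main__":
    for m, p in [(1.0, 1.5), (2.0, 1.5), (0.51, 2.0), (1.01, 1.0), (1e6, 2.0)]:
        print(f"m={m}, p={p}: L > {moment_lower_bound(m, p):.3f}")
```

For $p = 3/2$ this gives $10$ at $m = 1$ and about $7.67$ at $m = 2$, and the bound decreases to $6$ as $m$ grows, in line with the illustration above.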

First, a well-known lemma whose proof is omitted. 



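Lemma 2.2. Let $s := m + 1/2 - 1/p$ and let

$\left(\sum_k |\theta_k|^p\right)^{1/p} + \left(\sum_{j \ge j_0} \left(2^{js} \left(\sum_k |\theta_{j,k}|^p\right)^{1/p}\right)^q\right)^{1/q} \le A$

for some $A > 0$. Then for all $l \ge j_0$,

$\sum_{j \ge l,\,k} \theta_{j,k}^2 \le \begin{cases} A^2 2^{-2lm}/(1 - 2^{-2m}) = O(n^{-2\alpha m}), & \text{if } p \ge 2, \\ A^2 2^{-2ls}/(1 - 2^{-2s}) = O(n^{-2\alpha s}), & \text{if } 1 \le p < 2, \end{cases}$

for any $\alpha$ such that $2^l \ge n^{\alpha} = 2^{\alpha h}$.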



As indicated in the Appendix, the previous lemma shows that, if we want
to achieve the same minimax rate as in the Gaussian case, we need not
worry about the (finer wavelet) coefficients in the levels $j \ge l = \alpha h$, as long
as $\alpha > 1/(2m+1)$, if $p \ge 2$, and $\alpha > m/((2m+1)s)$ if $1 \le p < 2$. Indeed,
the square of the $\ell^2$-norm of these coefficients is of order $o(n^{-2m/(2m+1)})$.
For $p \ge 2$, let $l$ be such that $2n^{1/(2m+1)} \ge 2^l \ge n^{1/(2m+1)}$. Then the simple
estimator which discards the noisy coefficients of indices $l$ and above (keeping
them otherwise) achieves the minimax rate since

$\sum_{j \ge l,\,k} \theta_{j,k}^2 = O(n^{-2m/(2m+1)})$ and $\sum_{j < l,\,k} E z_{j,k}^2 = O(n^{-2m/(2m+1)})$.

Recall now a classical exponential inequality due to Kolmogorov (see [19], 
page 855). 
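Lemma 2.3. Let $X_i$, $i = 1, \dots, n$, be zero mean, independent random
variables. Let $s_n^2 := \sum_{i=1}^n E X_i^2$, $\sup_i \|X_i\|_\infty \le K$, and let $S_n = \sum_{i=1}^n X_i$.
Then for all $x > 0$,

$P(S_n > s_n x) \le \begin{cases} \exp\left(-\frac{x^2}{2}\left(1 - \frac{xK}{2s_n}\right)\right), & \text{if } x \le s_n/K, \\ \exp\left(-\frac{x s_n}{4K}\right), & \text{if } x \ge s_n/K. \end{cases}$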
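
To get a feeling for the exponential inequality of Lemma 2.3, the following sketch evaluates a bound of this two-regime form, $\exp(-\frac{x^2}{2}(1 - \frac{xK}{2s_n}))$ for $x \le s_n/K$ and $\exp(-x s_n/(4K))$ for $x \ge s_n/K$, and compares it with the exact tail of a sum of independent random signs, for which $K = 1$ and $s_n = \sqrt{n}$. The second regime is an assumption of this example, taken from the way the lemma is used in the proof of Lemma 2.4; the function names are hypothetical.

```python
import math


def kolmogorov_bound(x: float, s_n: float, K: float) -> float:
    """Two-regime exponential bound on P(S_n > s_n * x) for |X_i| <= K."""
    if x <= 0 or s_n <= 0 or K <= 0:
        raise ValueError("x, s_n and K must be positive")
    if x <= s_n / K:
        return math.exp(-(x * x / 2.0) * (1.0 - x * K / (2.0 * s_n)))
    return math.exp(-x * s_n / (4.0 * K))


def rademacher_tail(n: int, x: float) -> float:
    """Exact P(S_n >= sqrt(n) * x) for S_n a sum of n independent +-1 signs.

    S_n = 2B - n with B ~ Binomial(n, 1/2), so the event is B >= (n + sqrt(n) x)/2.
    This probability dominates P(S_n > sqrt(n) * x).
    """
    b_min = max(math.ceil((n + math.sqrt(n) * x) / 2.0), 0)
    return sum(math.comb(n, b) for b in range(b_min, n + 1)) / 2.0 ** n


if __name__ == "__main__":
    n = 100
    for x in (1.0, 2.0, 3.0):
        print(x, rademacher_tail(n, x), kolmogorov_bound(x, math.sqrt(n), 1.0))
```

The two regimes agree at $x = s_n/K$, where both expressions equal $\exp(-x^2/4)$.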



The next lemma is a simple application of the previous one. It is used in
the proof of Theorem 2.1 to upper estimate
$E(T^S_{\lambda_{j,k}}(\theta_{j,k} + z_{j,k}) - \theta_{j,k})^2 \mathbf{1}_{\{|z_{j,k}| > b_{j,k}\}}$,
for appropriately chosen $\lambda_{j,k}$ and $b_{j,k}$.

Lemma 2.4. Let $(X_{i,n})_{i,n \in \mathbb{N}}$ be zero mean random variables such that,
for each fixed $n$, the $X_{i,n}$ are independent. Let $\sum_i E X_{i,n}^2 = 1$ and
$\sup_i \|X_{i,n}\|_\infty \le K_n$, where $\lim_{n \to \infty} K_n = 0$. Let $F_n$ be the distribution
function of $\sum_i X_{i,n}$, and let $(a_n)$ be a sequence of positive reals with $a_n = o(1/K_n)$
and such that, for all $n \in \mathbb{N}$, $k_n := (1 - a_n K_n/2) > 0$. Then, for any $a$ with
$0 < a < a_n$,

$\int_a^\infty x^2 F_n(dx) \le \frac{a^2 + 2}{k_n} \exp(-k_n a^2/2) + o(\exp(-1/K_n))$.



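Proof. Using Lemma 2.3, we have

$\int_a^\infty x^2\,dF_n(x) = a^2(1 - F_n(a)) + 2\int_a^\infty x(1 - F_n(x))\,dx$

$\le a^2\exp(-k_n a^2/2) + 2\int_a^\infty x\exp(-k_n x^2/2)\,dx + 2\int_{1/K_n}^\infty x\exp(-x/(4K_n))\,dx$

$= a^2\exp(-k_n a^2/2) + \frac{2}{k_n}\exp(-k_n a^2/2) - \left[8K_n x\exp(-x/(4K_n))\right]_{1/K_n}^\infty + 8K_n\int_{1/K_n}^\infty \exp(-x/(4K_n))\,dx$

$= a^2\exp(-k_n a^2/2) + \frac{2}{k_n}\exp(-k_n a^2/2) + (8 + 32K_n^2)\exp(-1/(4K_n^2))$

$\le \frac{a^2 + 2}{k_n}\exp(-k_n a^2/2) + o(\exp(-1/K_n))$. □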
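
The only explicit computation in this proof is the boundary integral $2\int_{1/K_n}^\infty x\exp(-x/(4K_n))\,dx$, which integration by parts evaluates to $(8 + 32K_n^2)\exp(-1/(4K_n^2))$. The sketch below checks this closed form against a plain trapezoid rule and illustrates why the term is $o(\exp(-1/K_n))$ as $K_n \to 0$. The function names, the truncation point and the number of steps are choices made for this example only.

```python
import math


def boundary_term_closed_form(K: float) -> float:
    """Value of 2 * int_{1/K}^infty x exp(-x/(4K)) dx obtained by parts."""
    return (8.0 + 32.0 * K * K) * math.exp(-1.0 / (4.0 * K * K))


def boundary_term_numeric(K: float, steps: int = 200_000) -> float:
    """Trapezoid approximation of the same integral, truncated at 1/K + 240 K.

    Beyond the truncation point the integrand is smaller than its value at 1/K
    by a factor of order exp(-60), so the neglected tail is negligible.
    """
    a = 1.0 / K
    b = a + 240.0 * K
    h = (b - a) / steps

    def f(x: float) -> float:
        return 2.0 * x * math.exp(-x / (4.0 * K))

    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    for K in (0.5, 0.25):
        print(K, boundary_term_closed_form(K), boundary_term_numeric(K))
    # Ratio to exp(-1/K): tends to 0 quickly as K decreases.
    for K in (0.2, 0.1, 0.05):
        print(K, boundary_term_closed_form(K) / math.exp(-1.0 / K))
```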

We further need the following large deviation result, which is a simple 
extension of Lemma 5.8 in [17]; the difference with this lemma is that the 
requirement of identical distributions is dropped. The proof with the help of 
Esseen's inequality ([17], Theorem 5.4) is essentially the same as for Lemma
5.8 in [17] (our $C$ below is $A$ in [17]).

This lemma is used to show that, for a large class of noise, and midsize 
thresholds, the soft thresholding risk converges to the Gaussian risk. 

Lemma 2.5. Let $(X_{i,n})_{i,n \in \mathbb{N}}$ be zero mean random variables such that,
for each fixed $n$, the $X_{i,n}$ are independent. Let $\sum_i E X_{i,n}^2 = 1$ and let
$M_n := \sum_i E|X_{i,n}|^3 < +\infty$. Then for all $0 < \varepsilon < 1$ there exist $\beta_n$ with $\beta_n \to 1$ such
that, for all $x$ with $|x| \le (1 - \varepsilon)\sqrt{2\log(1/(CM_n))}$,

$((_^,^]) ^V/^n ana fJn< ci,((x,+oo)) - 
where C is an absolute constant. 

Remark 2.6. From the end of the proof of Theorem 2.1, we infer that
the thresholds $\lambda_{j,k}$ can asymptotically be chosen as in the Gaussian case
($\tilde\lambda = \lambda$ for $\lambda \le a_n/2$, while $\tilde\lambda/\lambda \to 1$ for $\lambda > a_n/2$). However, in this proof
the thresholds $\tilde\lambda$ are larger than the Gaussian ones (with the same variance).
In the Gaussian functional approach, the optimal minimax thresholds are of
order $C\sigma\sqrt{(j - j_0)_+}/\sqrt{n}$, where $C$ and $j_0$ depend on $m$, $p$ and $q$ and where $\sigma^2$
is the variance of the noise (e.g., see [4, 16]). For the ideal estimator approach,
the optimal minimax rate is achieved with thresholds of uniform size $\sim
\sigma\sqrt{2\log n}/\sqrt{n}$, and we also know (see Theorem 6.1 in [1]) that thresholds
can be chosen levelwise to still produce a minimax method. There, for the
level $j$ the thresholds were chosen to be of size $\sim \sigma\sqrt{2j\log 2}/\sqrt{n}$. Now, using
thresholds of size $C\sigma\sqrt{j}/\sqrt{n}$ for the level $j$ in the function space approach
almost achieves the ideal minimax rate; it is only worse by a factor $O(\log n)$.
This discrepancy cannot be avoided in general and, at least for $p \ge 2$, no
set of thresholds will achieve the optimal minimax rate in both contexts.
Indeed, let $X_i = f_i + e_i$, $i = 1, \dots, n$, where the $e_i$ are i.i.d. normal random
variables with mean zero and variance $1/n$, and let $\rho_\Phi(\cdot,\cdot)$ be defined as in
the Appendix (or as Theorem 2.1 in [1]). If $\lambda \ge \theta \ge 0$, then

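(2.5) $\rho_\Phi(\lambda, \theta) \ge \theta^2\,\Phi((-\lambda - \theta, \lambda - \theta)) + \int_{-\infty}^{-\lambda - \theta} (x + \lambda)^2\,\Phi(dx) \ge \frac{\theta^2}{2}$.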

Let $\lambda_{n,j}$ ($n$ is for the number of coefficients, while $j$ is a particular level)
be a set of thresholds which achieve the optimal minimax rate in the ideal
estimator context. For a fixed $\alpha \in (0,1)$, the optimal thresholds for the level
$j = \alpha\log_2 n$ have to be at least of size $\sim \sqrt{2j\log 2}/\sqrt{n} = C\sqrt{j}/\sqrt{n}$, where
$C$ is a constant. The reason is that $2^j\rho_\Phi(\lambda_j, 0) = O(\log n/n)$ is needed to
achieve the minimax rate for the ideal estimator approach. Let now



$j_0 := \min\left\{ j : C\sqrt{j}/\sqrt{n} > 2A/2^{j(2m+1)/2} \right\}$.

Simple computations yield that $j_0 \sim (\log_2 n)/(2m+1)$. If $\theta_{j_0,k} = A/2^{j_0(2m+1)/2}$,
$k = 0, \dots, 2^{j_0} - 1$, and $\theta_{j,k} = 0$ elsewhere, then clearly $\|\theta\|_{B^m_{p,q}} \le A$. If $n$ tends
to infinity, then for $n$ larger than a certain bound, $\lambda_{n,j_0} > A/2^{j_0(2m+1)/2}$.
Now, it follows from (2.5) that the risk for thresholding the signal $(\theta_{\cdot,\cdot})$ at
level $j_0$ with thresholds $\lambda_{n,j_0}$ is at least as large as

(2.6) $2^{j_0} A^2 2^{-j_0(2m+1)}/2 = A^2 2^{-j_0 2m}/2$.

Using the definition of $j_0$, we obtain

(2.7) $2^{2m+3} A^2 2^{-j_0(2m+1)} \ge \frac{C^2(j_0 - 1)}{n}$.

Combining the relations (2.6) and (2.7) shows that the risk for estimating
$(\theta_{\cdot,\cdot})$ is as large as

$A^2 2^{-j_0 2m - 1} \ge A^{2/(2m+1)} C^{4m/(2m+1)} n^{-2m/(2m+1)} (j_0 - 1)^{2m/(2m+1)} 2^{-2m(2m+3)/(2m+1) - 1}$.

Since $j_0 \sim \log_2 n/(2m+1)$, this is worse than the minimax rate for $B^m_{p,q}$.



3. Heavy tails and median filtering. To date, asymptotics for wavelet 
thresholding seems to have been restricted to noise with higher moments. 
Next, we want to try to apply wavelet thresholding to noise with heavy 
tails and study the corresponding quadratic risks. By first applying a me- 
dian filter to the data, the absence of finite moments will be overcome. The 
downside of this approach, however, is that it introduces an additional bias. 
Nevertheless, under these conditions wavelet thresholding applied to the fil- 
tered data achieves at least the same minimax rate as in the normal case, 
but the constants are larger. Various types of nonlinear smoothers involving 
medians have proved useful in time series analysis (e.g., see [14, 18, 20]). 
Another, wavelet inspired, approach to denoising heavy tails based on a 
different preprocessing method is also developed in [10]. 

Below, given $a_1, \dots, a_{2k+1}$, let $\mathrm{med}(a_1, \dots, a_{2k+1})$ be the real $x$ such that
$\#\{i : a_i \ge x\} = k + 1$ and $\#\{i : a_i \le x\} = k + 1$, with $\#$ denoting cardinality.
To simplify notation, we use the abbreviation $\mathrm{med}(a_i, 2k+1)$ for
$\mathrm{med}(a_{i-k}, \dots, a_{i+k})$. If $i \le k$, then $\mathrm{med}(a_i, 2k+1) :=
\mathrm{med}(a_1, \dots, a_{2k+1})$; a similar boundary correction is performed for the largest
indices.
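
The running median with its boundary correction can be sketched as follows. This is only an illustration of the definition above (0-based indexing, window clamped at both ends so that it always contains $2k+1$ points); the function name is hypothetical.

```python
import statistics


def median_filter(a, k):
    """Running median of window width 2k+1 with the boundary correction above.

    For interior indices the window is centred at i; near either end the
    window is shifted so that it still contains exactly 2k+1 observations.
    """
    n = len(a)
    width = 2 * k + 1
    if k < 0 or n < width:
        raise ValueError("need k >= 0 and at least 2k+1 observations")
    out = []
    for i in range(n):  # 0-based i corresponds to index i+1 in the text
        start = min(max(i - k, 0), n - width)
        out.append(statistics.median(a[start:start + width]))
    return out


if __name__ == "__main__":
    # A single large outlier is removed by a window of width 3.
    print(median_filter([1, 100, 2, 3, 4], 1))
```

A single extreme observation cannot dominate a window of width $2k+1$ when $k \ge 1$, which is the mechanism behind the moment gain quantified in the next lemma.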

Our first lemma makes the advantage of the median filter clear as far 
as the existence of moments is concerned. It shows, for example, that the 
median of thirteen independent Cauchy random variables has moments of 
order $7 - \varepsilon$, $\varepsilon > 0$.
Lemma 3.1. Let $X_1, \dots, X_{2k-1}$, $k \ge 1$, be independent random variables.
For any $x > 0$,

$P(\mathrm{med}(X_1, \dots, X_{2k-1}) > x) \le \binom{2k-1}{k} \max_{i=1,\dots,2k-1} (P(X_i > x))^k$.

In particular, if there exist constants $C > 0$, $\gamma > 0$ such that, for $x$ large
enough, $\max_{i=1,\dots,2k-1} P(|X_i| > x) \le Cx^{-\gamma}$, then $\mathrm{med}(X_1, \dots, X_{2k-1})$ has
moments of order $r < k\gamma$.



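Proof.

$\{\mathrm{med}(X_1, \dots, X_{2k-1}) > x\} = \bigcup_{M \subset \{1,\dots,2k-1\},\ \#M = k} \{X_i > x : i \in M\}$.

Hence, by independence of the $X_i$,

$P(\mathrm{med}(X_1, \dots, X_{2k-1}) > x) \le \binom{2k-1}{k} \max_{i=1,\dots,2k-1} (P(X_i > x))^k$. □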
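
For identically distributed variables the bound of Lemma 3.1 can be compared with the exact tail of the median, since the median of $2k-1$ values exceeds $x$ exactly when at least $k$ of them do. The sketch below does this for standard Cauchy variables with $2k - 1 = 13$, the example mentioned before the lemma; the function names are hypothetical.

```python
import math


def cauchy_tail(x: float) -> float:
    """P(X > x) for a standard Cauchy random variable."""
    return 0.5 - math.atan(x) / math.pi


def median_tail_exact(p: float, k: int) -> float:
    """P(med of 2k-1 i.i.d. variables > x) when each exceeds x with probability p."""
    n = 2 * k - 1
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def median_tail_bound(p: float, k: int) -> float:
    """Upper bound binom(2k-1, k) * p^k of Lemma 3.1 (i.i.d. case)."""
    return math.comb(2 * k - 1, k) * p ** k


if __name__ == "__main__":
    k = 7  # median of thirteen Cauchy variables
    for x in (1.0, 10.0, 100.0, 1000.0):
        p = cauchy_tail(x)
        print(x, median_tail_exact(p, k), median_tail_bound(p, k),
              x ** k * median_tail_bound(p, k))
```

The last column stabilises as $x$ grows, which reflects a tail of order $x^{-7}$ and hence finite moments of every order below $7$.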

Let us now give the main result of this section. As before, the data is given
via (1.1), while $W_n$ is a discrete wavelet transform as in the previous section
(in particular, it is generated by a compactly supported wavelet $\psi$ which is
Hölder continuous of index $\beta > 0$, chosen later). Again, let $\theta = W_n(f)$ and
let also $\mathrm{med}(X, 2l+1) := (\mathrm{med}(X_i, 2l+1))_{1 \le i \le n}$, where (using the notation
set above) $\mathrm{med}(X_i, 2l+1) := \mathrm{med}(X_{i-l}, \dots, X_{i+l})$, for $i - l \ge 1$ and
$\mathrm{med}(X_i, 2l+1) := \mathrm{med}(X_1, \dots, X_{2l+1})$ otherwise.

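Theorem 3.2. Let the $e_i$ be symmetric with $E|e_i|^\gamma < +\infty$, for some
$\gamma > 0$. Let $A, B > 0$. Then there exist an $l = l(\gamma)$ and thresholds $\lambda_{j,k}$ such
that

$\sup_{\|\theta\|_{B^m_{p,q}} \le A,\ \sum_i |f_i - f_{i-1}|^2 \le B/n} E\sum_{j,k} |T^S_{\lambda_{j,k}}(W_n(\mathrm{med}(X, 2l+1))_{j,k}) - \theta_{j,k}|^2 = O(n^{-2m/(2m+1)})$.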



We impose the condition $\sum_i |f_i - f_{i-1}|^2 \le B/n$ to have control over the
$\ell^2$-norm of the bias, that is, on

(3.1) $\sum_{i=1}^n (\mathrm{med}(X_i, 2l+1) - \mathrm{med}(e_i, 2l+1) - f_i)^2$,

which we introduce by median filtering the data. This condition is not that 
strong and in most cases follows from the Besov norm condition. We will 
take another look at this after the proof of the theorem. 
Theorem 3.2 is more than just an existence result. Indeed, from its proof
we infer that the above thresholds $\lambda_{j,k}$ can asymptotically be chosen as in
the Gaussian case, but with a new variance which is now at most $2D^2\sigma^2_{\max}$,
with $\sigma^2_{\max}$ given below and with $D = 2l+1$ (see also Remark 2.6).

Proof of Theorem 3.2. Since the $e_i$ are symmetric, $E\,\mathrm{med}(e_i, 2l+1) = 0$
(again $l$ is chosen later). Let $y_{j,k}$ be the coefficient of index $j,k$ of
$W_n(\mathrm{med}(e, 2l+1))$, and let

$(b_{j,k}) := W_n(\mathrm{med}(X, 2l+1)) - W_n(f) - (y_{j,k})$.

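First we prove that the influence of the random variables $b_{j,k}$ (the bias) is
not too large in our estimation problem:

$E(T^S_{\lambda_{j,k}}(W_n(\mathrm{med}(X, 2l+1))_{j,k}) - \theta_{j,k})^2$
$= E(T^S_{\lambda_{j,k}}(\theta_{j,k} + b_{j,k} + y_{j,k}) - \theta_{j,k})^2$
$\le 2Eb_{j,k}^2 + 2E(T^S_{\lambda_{j,k}}(\theta_{j,k} + y_{j,k}) - \theta_{j,k})^2$,

since $|T^S_\lambda(x_1) - T^S_\lambda(x_2)| \le |x_1 - x_2|$. Thus,

$\sum_{j,k} E(T^S_{\lambda_{j,k}}(\theta_{j,k} + b_{j,k} + y_{j,k}) - \theta_{j,k})^2 \le 2\sum_{j,k} Eb_{j,k}^2 + 2\sum_{j,k} E(T^S_{\lambda_{j,k}}(\theta_{j,k} + y_{j,k}) - \theta_{j,k})^2$.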



Note that

$\sum_{j,k} b_{j,k}^2 = \sum_{j,k} (W_n(\mathrm{med}(X, 2l+1) - \mathrm{med}(e, 2l+1) - f))_{j,k}^2 = \sum_{i=1}^n (\mathrm{med}(X_i, 2l+1) - \mathrm{med}(e_i, 2l+1) - f_i)^2$.

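But for $l < i \le n - l$,

$|\mathrm{med}(X_i, 2l+1) - \mathrm{med}(e_i, 2l+1) - f_i|$
$\le |\mathrm{med}(e_{i-l} + f_{i-l} - f_i, \dots, e_{i+l} + f_{i+l} - f_i) - \mathrm{med}(e_i, 2l+1)|$
$\le \max_{j=-l,\dots,l} |f_{i+j} - f_i|$
$\le \sum_{j=-l+1}^{l} |f_{i+j} - f_{i+j-1}|$.

If $i \le l$ or $i > n - l$, then, similarly, $|\mathrm{med}(X_i, 2l+1) - \mathrm{med}(e_i, 2l+1) - f_i| \le
\sum_{j=2}^{2l+1} |f_j - f_{j-1}|$, respectively, $\le \sum_{j=n-2l+1}^{n} |f_j - f_{j-1}|$. Hence, by the
Cauchy-Schwarz inequality,

$\sum_{j,k} b_{j,k}^2 \le \sum_{i=l+1}^{n-l} 2l \sum_{j=-l+1}^{l} |f_{i+j} - f_{i+j-1}|^2 + 2l^2 \sum_{j=2}^{2l+1} |f_j - f_{j-1}|^2 + 2l^2 \sum_{j=n-2l+1}^{n} |f_j - f_{j-1}|^2 \le 6l^2 \sum_{i=2}^{n} |f_i - f_{i-1}|^2$.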

This implies that, if we choose a fixed median filter, then $\sum_{j,k} Eb_{j,k}^2$ is
negligible compared to $O(n^{-2m/(2m+1)})$. Thus, to finish the proof, it suffices to
show that

$\sup E\sum_{j,k} (T^S_{\lambda_{j,k}}(W_n(f + \tilde e)_{j,k}) - \theta_{j,k})^2 = O(n^{-2m/(2m+1)})$,

where $\tilde e_i := \mathrm{med}(e_i, D)$ and $D = 2l+1$ is chosen such that $E|\tilde e_i|^L < \infty$ and
$L$ satisfies the moment conditions (which depend on $\gamma$) of Theorem 2.1. If
the $\tilde e_i$ were independent, which they are not, we could apply Theorem 2.1
to conclude. The next two lemmas deal with this new situation (the
$D$-dependent case) and, respectively, correspond to Lemma 2.3 and to Lemma
2.5 in the independent case.



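Lemma 3.3. Let $X_1, \dots, X_n$ be zero mean bounded random variables,
with $\sup_i \|X_i\|_\infty \le K$, and also $D$-dependent, that is, such that $X_{i_1}, \dots, X_{i_k}$
are independent if $\min_{1 \le j < r \le k} |i_j - i_r| \ge D$. Let $S_j = \sum_{i \ge 0:\ iD+j \le n} X_{iD+j}$,
$j = 1, \dots, D$, $\sigma_j^2 = ES_j^2$, and $\sigma_{\max} = \max_{j=1,\dots,D} \sigma_j$. Then

(3.2) $P\left(\sum_{i=1}^n X_i > x\right) \le D \cdot \begin{cases} \exp\left(-\frac{x^2}{4D^2\sigma^2_{\max}}\right), & \text{if } x \le \frac{D\sigma^2_{\max}}{K}, \\ \exp\left(-\frac{x}{4KD}\right), & \text{if } x \ge \frac{D\sigma^2_{\max}}{K}. \end{cases}$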



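Proof. Note that the $S_j$ are sums of independent random variables:

$P\left(\sum_{i=1}^n X_i > x\right) \le \sum_{j=1}^D P(S_j > x/D) = \sum_{j=1}^D P(S_j/\sigma_j > x/(\sigma_j D))$

$\le \sum_{j=1}^D \begin{cases} \exp\left(-\frac{x^2}{2\sigma_j^2D^2}\left(1 - \frac{xK}{2\sigma_j^2D}\right)\right), & \text{if } x \le \frac{\sigma_j^2 D}{K}, \\ \exp\left(-\frac{x}{4KD}\right), & \text{if } x \ge \frac{\sigma_j^2 D}{K}, \end{cases}$

$\le D \cdot \begin{cases} \exp\left(-\frac{x^2}{4D^2\sigma^2_{\max}}\right), & \text{if } x \le \frac{\sigma^2_{\max} D}{K}, \\ \exp\left(-\frac{x}{4KD}\right), & \text{if } x \ge \frac{\sigma^2_{\max} D}{K}, \end{cases}$

where the last inequality holds since for $x \le \sigma_j^2 D/K$,

$\frac{x^2}{2\sigma_j^2D^2}\left(1 - \frac{xK}{2\sigma_j^2D}\right) \ge \frac{x^2}{4D^2\sigma_j^2} \ge \frac{x^2}{4D^2\sigma^2_{\max}}$,

while for $\sigma_j^2 D/K \le x \le \sigma^2_{\max} D/K$ we have $\frac{x}{4KD} \ge \frac{x^2}{4D^2\sigma^2_{\max}}$. □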



With the help of Lemma 3.3, it is also possible to prove a D-dependent 
version of Lemma 2.4. 

Lemma 3.4. Let (Xj, 

,n)i,nGN ^6 ^CTO mean random variables such that, 
for each fixed n, the Xi^n are D-dependent and such that Mn := E\Xi^riC' < 
+00. Let Sj^n = T,i XiD+j,n, j = 1, • • • , -D, let a-|„ = ESj „^ and let <Tmax,n = 
maxj=i^...^£) (Tj^„. Then for all < e < 1, there exist /?„ = I3n{s) with 
limsup^^^oQ /3n = L) such that 



sup 



p(i:iXi,n<-x) 



and 



sup 



0<x<eD......„V 21og(l/M„) $((a:/(cTmax,nI)), +Oo)) 

Proof. If x > 0, then 



<Pn- 



D 



< ^P(5'j-„/crj-„ > x/(crmax,n^))- 

i=i 

A similar inequality holds for x < 0. Since the Sj^n are sums of independent 
random variables, the assertion follows from Lemma 2.5. □ 
Note that it is, moreover, trivial that, for x > 0,



^>((-CX), a;/(cTmax,n-D))) ^((-2;/(fTmax,n-D), +Oo)) 

and this gives a version of the other half of Lemma 2.5 in the D-dependent 
case. 

The rest of the proof of Theorem 3.2 is then quite similar to the proof of 
Theorem 2.1. Let us return to it. Again, let := med(ej,D) be as defined 
above. First, as in the proof of Theorem 2.1, we can assume that the e, are 
bounded by for some 5 > 0, since the upper estimate in Lemma A.l holds 
for D-dependent random variables with a constant depending now also on 
D. 

Of importance in the proof of Theorem 2.1 was the distribution of the
noise in the wavelet coefficients. We denote the coefficients of the wavelet
transform by $(c_{j,k,i})$, that is, $\theta_{j,k} = \sum_i c_{j,k,i} f_i$. At the boundary we have the
problem that the filtered noise $\tilde e_i := \mathrm{med}(e_i, D)$ satisfies
$\tilde e_1 = \cdots = \tilde e_{(D+1)/2}$ and $\tilde e_{n-(D-1)/2} = \cdots = \tilde e_n$. But

$\sum_{i=1}^n c_{j,k,i}\tilde e_i = \left(\sum_{i=1}^{(D+1)/2} c_{j,k,i}\right)\tilde e_{(D+1)/2} + \sum_{i=(D+1)/2+1}^{n-(D-1)/2-1} c_{j,k,i}\tilde e_i + \left(\sum_{i=n-(D-1)/2}^{n} c_{j,k,i}\right)\tilde e_{n-(D-1)/2}$

and this last expression, which is a sum of $n - D + 1$ random variables which
are $D$-dependent, thus satisfies (after reordering) the conditions of Lemmas
3.3 and 3.4. Anyway, only about $O(\log n)$ wavelet coefficients are affected
by this problem. If we do not threshold these coefficients, the risk would
increase at most by $O((\log n)/n)$ and this is negligible compared to the
minimax risk. Let $y_{j,k,r} = \sum_i c_{j,k,iD+r}\tilde e_{iD+r}$. Then using the $D$-dependence
condition, we see that $y_{j,k,r}$ is a sum of independent random variables with,
moreover, $y_{j,k} = \sum_{r=1}^D y_{j,k,r}$. Let $\sigma^2_{j,k,r} = Ey^2_{j,k,r}$ and let $\sigma^2_{j,k,\max} = \max_r \sigma^2_{j,k,r}$.
Then, clearly, (Tjkma.x — and a version of Lemma A.l holds for the 
Uj^k, the upper constants depending now also on D. Given this, as well as 
Lemmas 3.3 and 3.4, we can now proceed as in the proof of Theorem 2.1. 
Hence, with the right thresholds, we can achieve D times the performance 
of the Gaussian risk with variance 2a'^^^D'^, where (T^^ax — ^^^j<t,k ^^j^k, ma.xJ 
t being the finest level where the wavelet coefficient is not discarded. 

Also of importance in Lemmas 3.3 and 3.4 is the term cJmaxi show next 
that in general, 

(3-3) a]^k,m!,x^ Eej/D. 

Let $h = \log_2 n$. Since the wavelet $\psi$ is compactly supported and Hölder
continuous of index $\beta$, we know that (see, e.g., the proof of Theorem 4.1 in
[1])

$|2^{(h-j)/2} c_{j,k,i} - \psi(2^{j-h}i - k)| \le C_1 2^{(j-h)\beta}$,
with also

$|\psi(2^{j-h}i - k) - \psi(2^{j-h}(i-1) - k)| \le C_2 2^{(j-h)\beta}$,
for some constants $C_1, C_2$. Thus,

and 

I 2 _ 2 I _ I _ II I I 

since the |cj^fc^j| are of order 0{2^^~^^/'^), C3 being a constant. But a 



j,k,r 



^HJ2iCj^k^iU+r ^^'^ #{^:cj_fc^j / 0} = 0(2'* (again, # denotes cardinal- 
ity) since the wavelet is compactly supported. Indeed, recall that (see, e.g., 
[3]) 



(3.4) 

and 

(3.5) 



2J0-3 (N-l) 

= X! '^jo-j,i+2n-ik't'jo,i 
i=0 



'j,k 



2J0-J(Ar-l) 

E - 

i=0 



jo-j,i+2n-jk'rjQ,ii 



where u.^. and v.^. depend only on the scaling identities (whose size we 
set equal to A''). This claim about the length of the filters (uj^^-j) and 
{ujg-j) can be proved via a simple induction argument. Actually (see [1]), 



: 0(2(jo-J)/2) g^^^ 

max,- 



Vi, 



:O(2(j0"J)/2). Thus, 



I 2 _ 2 I 
\^j,k,r ^j,k,r+l\ 



i 

= 0{2^^~''^^\Eel\). 

Since J2r kr~ J^i'^'j ki~ ' ^] kr ^^^^ about the same size 

and, thus, a^f^ ^^^^ ^ Ee\/D. This completes the proof of Theorem 3.2. □ 

We now turn to the problem of finding out when the condition

(3.6) $\sum_{i=1}^{n-1} |f_i - f_{i+1}|^2 = O(1/n)$

follows from $\|\theta\|_{B^m_{p,q}} \le A$. If $m < 1$, then assume $\beta > m$, where $\beta \le 1$ is the
Hölder continuity exponent of the wavelet (otherwise the characterization of
smoothness via wavelets does not make sense). Again, $h = \log_2 n$. Since for
a constant $C_1 > 0$,

and #{i : Cj^k,i 7^ 0} = 0(2^-^), it follows that 

n-l 

1=1 

where C2 is another constant. Note that since the wavelet transform is an 
orthonormal transformation, fi = J2j k'^j,kCj,k,i- Thus, 

n-l n-l / \ 2 

i=l i=l \j,k / 

n-l h-1 / \ 2 

^ I] ^ H H Hkicj,k,i - Ci,fc,i+l) 
1=1 j=0 \ k / 

n-l h-1 

i=l j=0 k 

since #{/c : Cj^fc,j / or Cj^k,i+i 7^ 0} = 0(1), see (3.4) and (3.5), and with C3 
a constant 

h-i 

= hCs ^^ajf^^{cj^k,i - Cj^k,i+if' 

j=0 k i 

j=0 k 

where the last inequality is proved by using arguments as in the proof of
Lemma 2.2. Thus, if $\beta = 1$ and $m \ge 1$, respectively, $s \ge 1$, then the last term
is equal to $O(hn^{-2})$. If $m < \beta$, respectively, $s < \beta$, then the last term is equal
to $O(hn^{-2m})$, respectively, $O(hn^{-2s})$. Hence, for $p \ge 2$ we obtain

(3.7) $\sum_{i=1}^{n-1} |f_i - f_{i+1}|^2 = O(\log n / n^{2m \wedge 2})$,

and for $p < 2$ we obtain

(3.8) $\sum_{i=1}^{n-1} |f_i - f_{i+1}|^2 = O(\log n / n^{2s \wedge 2})$.

Thus, for $p \ge 2$ the condition $m > 1/2$ will ensure that (3.6) holds. For $p < 2$
the additional condition $m > 1/p$ ensures that $2s > 1$ and, thus, (3.6) is
always satisfied.

Remark 3.5. Above, and also in view of the proof of Theorem 2.1, the 
i.i.d. assumption on e can be weakened and replaced by independence with 
$\sup_i E|e_i|^\gamma < +\infty$, for some $\gamma > 0$. The previous proofs also show how to
deal with noise (with or without higher moments) that is not independent, 
but D-dependent, where D is a fixed constant. Indeed, Lemmas 3.3 and 
3.4 are applicable, and then it is easy to mimic the proof of Theorem 3.2 
and the minimax rate for this problem is again as in the Gaussian case. 
To obtain such a result, the noisy wavelet coefficients need not converge in 
distribution to a normal random variable. Only the bounds of Lemma 3.4 
and of Lemma 3.3 are needed. This approach via large deviation results is 
also possible for other kinds of correlated noise. Under appropriate weak 
dependence conditions, the law of the noisy wavelet coefficients is asymp- 
totically normal with a variance possibly bigger than the variance of the 
original noise. Wavelet thresholding has also been investigated for station- 
ary Gaussian noise; for example, see [12, 21]. Let us finally mention that it 
would be interesting to transfer the "ideal framework with quadratic risk" 
to heavy tail noise via median filtering. 

Remark 3.6. The upper bounds obtained in Theorems 2.1 and 3.2 can 
often be complemented with lower bounds of the same order for various types 
of noises. In turn, these bounds often represent the order of the minimax rate 
among all estimators (see the various references cited in the introductory 
section). However, different nonlinear estimators can outperform wavelet 
thresholding for still other types of noise. Let us briefly present such an
estimator. The model is the usual one, $X_i = f_i + e_i/\sqrt{n}$, $i = 1, \dots, n = 2^h$,
where the $e_i$ are zero mean i.i.d. random variables with finite second moment.
Our estimator of $f_i$ based on the $X_i$ is



(3.9 f,- := max A,-_i_,- , z = 1, . . . , n — M -h 1, 

^ ^ j=0,...,A/-l ' ' ' 



where M := M(n) will be chosen later. Let cm '■= -E'maxj=i^...^7v/ Ci. Thus, for 
i<n- M + 1, 




and for i > n — M + 1 



(3.10) 



fi — fn~M+l 
Hence,

\fi — fi\ ^ niax I fi+i — fi\ -\ 1= 

■'^^ - j=o,...,M-i ^ y/n 

and 

/M-l \ 2 



max Cj+o — Cm 
i=o,...,M-i 



E\fi - Zip < ^ \ fi+j - fi+j-i\j + ^^(^_^.^max_^ei+j - cm 

M-i 2 / 

< 2M V \ fi+j - + -E( max Cj+j - cm 

~^ re Vi=Ov,A^-i 

A similar computation for i> n — M + \ gives 

" 2 / ^2 

Hence, 

n n — 1 y 

(3.11) S^|/,-/i|2<4M2^|/i+i-/i|2+4var( .jnax^ei 

From (3.7) and (3.8) and taking $\beta = 1$, we know that $\sum_i |f_{i+1} - f_i|^2$ is
either of order $O(n^{-(2m \wedge 2)}\log n)$ or $O(n^{-(2s \wedge 2)}\log n)$, according to $p$. Thus,
$\mathrm{var}(\max_{i=1,\dots,M} e_i)$ and an optimal choice of $M$ control the right-hand side
of (3.11).

If the Ci are i.i.d. standard normal random variables, then var(maxj=i^...^jv/ ) 
is of order 1/2 log M. Hence, and say, for p > 2, minimizing in M (M = 
n"*'^^/(logn)^) gives a rate of order 0(1/ log n), coming short of the thresh- 
olding rate. 

Now, using arguments similar to the ones in the proof of [1], Theorem 4.1, 
it is easy to show that, for i.i.d. (symmetric) bounded noise, soft thresholding 
has the same minimax rate as it would have for Gaussian noise with the 
same variance. This can come short of the rate achieved by the estimator 
presented above. Indeed, if $e_1$ is a symmetric Bernoulli random variable with
law $(\delta_{-1} + \delta_1)/2$, then $\mathrm{var}(\max_{i=1,\dots,M} e_i) = 1/2^{M-2} - 1/2^{2M-2}$. Hence, for
$p \ge 2$,

(3.12) $\sup_{\|\theta\|_{B^m_{p,q}} \le A} E\sum_{i=1}^n |\hat f_i - f_i|^2 \le C(M^2 n^{-(2m \wedge 2)}\log n + 2^{-M})$,

where $C$ is a constant. The right-hand side in (3.12) is now minimized by
choosing $M = 2\log_2 n$ and, thus,

(3.13) $\sup_{\|\theta\|_{B^m_{p,q}} \le A} E\sum_{i=1}^n |\hat f_i - f_i|^2 = O\left(\frac{(\log n)^3}{n^{2m \wedge 2}}\right)$.

For $p < 2$, the right-hand side of (3.13) should be replaced by $O\left(\frac{(\log n)^3}{n^{2s \wedge 2}}\right)$. In
both cases the rate is better than $O(n^{-2m/(2m+1)})$, which is the minimax
rate for soft thresholding in the Gaussian model.

For another example, let the $e_i$ be uniformly distributed on $[-1, 1]$. Then
$\mathrm{var}\max_{i=1,\dots,M} e_i = 4M/((M+1)^2(M+2))$ is of order $O(1/M^2)$; hence, for
$p \ge 2$,

(3.14) $\sup_{\|\theta\|_{B^m_{p,q}} \le A} E\sum_{i=1}^n |\hat f_i - f_i|^2 \le C(M^2 n^{-(2m \wedge 2)}\log n + M^{-2})$

[resp. $\le C(M^2 n^{-(2s \wedge 2)}\log n + M^{-2})$ for $p < 2$], again $C$ is a constant. For
$p \ge 2$, taking $M = n^{(m \wedge 1)/2}/(\log n)^{1/4}$ [resp. $M = n^{(s \wedge 1)/2}/(\log n)^{1/4}$ for $p < 2$]
gives

(3.15) $\sup_{\|\theta\|_{B^m_{p,q}} \le A} E\sum_{i=1}^n |\hat f_i - f_i|^2 = O\left(\frac{\sqrt{\log n}}{n^{m \wedge 1}}\right)$

[resp. $O\left(\frac{\sqrt{\log n}}{n^{s \wedge 1}}\right)$]. These rates are better [only for $p > 1/(m + 1/2 - 2m/(2m+1))$,
when $s \wedge 1 = s$] than $O(n^{-2m/(2m+1)})$. In the light of Theorem 5.1,
the smoothness of the density of the compactly supported noise
might help thresholding reach the minimax rate among all estimators.
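
The two variance formulas used in Remark 3.6 can be verified with exact rational arithmetic. The sketch below enumerates all sign vectors in the symmetric Bernoulli case, and in the uniform case uses the fact that the maximum of $M$ independent uniform variables on $[0,1]$ has first and second moments $M/(M+1)$ and $M/(M+2)$, rescaled to $[-1,1]$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def var_max_bernoulli_enumerated(M: int) -> Fraction:
    """Variance of max(e_1, ..., e_M) for i.i.d. signs, by full enumeration."""
    vals = [max(signs) for signs in product((-1, 1), repeat=M)]
    mean = Fraction(sum(vals), len(vals))
    second = Fraction(sum(v * v for v in vals), len(vals))
    return second - mean ** 2


def var_max_bernoulli_formula(M: int) -> Fraction:
    """1/2^(M-2) - 1/2^(2M-2), written as 4/2^M - 4/4^M."""
    return Fraction(4, 2 ** M) - Fraction(4, 4 ** M)


def var_max_uniform(M: int) -> Fraction:
    """Variance of max of M i.i.d. uniforms on [-1, 1], from Beta(M, 1) moments."""
    # max on [-1, 1] equals 2 V - 1 with V the max of M uniforms on [0, 1].
    return 4 * (Fraction(M, M + 2) - Fraction(M, M + 1) ** 2)


if __name__ == "__main__":
    for M in range(1, 7):
        print(M, var_max_bernoulli_enumerated(M), var_max_uniform(M))
```

The Bernoulli variance decays geometrically in $M$ while the uniform one decays like $M^{-2}$, which is why the two examples lead to different choices of $M$.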

4. Concluding remarks on block thresholding and kernel estimators. Block 
thresholding, which applies thresholding to a whole block of wavelet coef- 
ficients, has been developed by Cai [2] as well as Hall, Kerkyacharian and 
Picard [11], to deal with signals exhibiting a correlation in the size of their 
wavelet coefficients which are above each other. More precisely, a block of 
noisy wavelet coefficients $\theta_1 + z_1, \dots, \theta_k + z_k$ is kept if $\sum_i (\theta_i + z_i)^2$ is larger
than a threshold, otherwise the whole block is set to zero (one could also 
keep a block if one of the coefficients in it is larger than a threshold). As de- 
fined, block thresholding shares the minimax properties of soft thresholding, 
in both the ideal and functional frameworks. 

In block thresholding the blocks are horizontal, that is, made up of the 
coefficients with indices $(j,k), \dots, (j,k+K)$. Below, we briefly present a
vertical block thresholding methodology (the blocks are vertical and not
disjoint) which also shares the same minimax properties as the horizon- 
tal block thresholding estimator. More importantly, we show that (for the 
Haar wavelet) this thresholding estimator is nothing but a kernel estimator 
with locally varying bandwidth. This is another instance of the well-known 
fact that thresholding rules represent a method of adaptive local selection 
of bandwidth (see [7]). 

First we introduce some terminology. We say that an index $(j',k')$ (or
the wavelet coefficient with this index) is above the index $(j,k)$ if $j' < j$
and $|\lfloor 2^{j'-j}k \rfloor - k'| \le J$, where $J \in \mathbb{N}$ is a positive constant. In vertical block
thresholding, if $|\theta_{j,k} + z_{j,k}|$ is larger than a threshold, then the coefficient
itself is kept and, moreover, all the coefficients above it are also kept. A
variation of this method is to keep the coefficients with the indices $(j,k')$,
$|k - k'| \le J$ (for some other constant $J$), as well as all the coefficients above
them.

This new method achieves (as quickly shown below) the optimal minimax
rate in the ideal estimator context (a similar result holds for the function
space approach too, but the proof is left out). In our usual model, let
the noise be i.i.d. standard normal random variables and let $\lambda_n$ be such
that $E\,1_{\{|z_1| > \lambda_n - 1\}}(1 + z_1^2) = 1/n$. With the background and methods of the
present paper and its companion [1], it is easy to see that $\lambda_n \sim \sqrt{2\log n}$. Let
$J$ be as above. Note that the number of coefficients above a coefficient is
less than $(2J+1)\log_2 n$, since in each level there are only $2J+1$ coefficients
above a fixed coefficient. Next, set $Y = W_n(X)$ and define the estimator $\hat\theta_{j,k}$
for the coefficient $\theta_{j,k}$ by

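$$\hat\theta_{j,k} := \begin{cases} Y_{j,k}, & \text{if } |Y_{j,k}| > \lambda_n \text{ or } \exists\,(j',k'):\ (j,k) \text{ is above } (j',k') \text{ and } |Y_{j',k'}| > \lambda_n,\\ 0, & \text{elsewhere.} \end{cases}$$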
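
The selection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: levels are indexed $j = 0, 1, \dots$ with $2^j$ coefficients each, positions $k$ are counted from 0, "above" is read as $j' < j$ and $|[2^{j'-j}k] - k'| \le J$, and the names `vertical_block_keep` and `vertical_block_estimate` are hypothetical.

```python
def vertical_block_keep(Y, lam, J):
    """Return a mask keep[j][k] for vertical block thresholding.

    Y[j] is the list of the 2**j noisy coefficients at level j (assumed
    layout). A coefficient is kept if it exceeds lam in absolute value, or
    if it lies above a coefficient that does.
    """
    keep = [[False] * len(level) for level in Y]
    for j, level in enumerate(Y):
        for k, y in enumerate(level):
            if abs(y) > lam:
                keep[j][k] = True
                for jp in range(j):            # coarser levels j' < j
                    centre = k >> (j - jp)     # [2**(j'-j) * k]
                    first = max(0, centre - J)
                    last = min(len(Y[jp]) - 1, centre + J)
                    for kp in range(first, last + 1):
                        keep[jp][kp] = True
    return keep


def vertical_block_estimate(Y, lam, J):
    """Estimator: Y[j][k] where the mask is True, 0 elsewhere."""
    keep = vertical_block_keep(Y, lam, J)
    return [[y if flag else 0.0 for y, flag in zip(level, flags)]
            for level, flags in zip(Y, keep)]


# A single large coefficient at level 3, position 5, with J = 0.
Y = [[0.1], [0.2, 0.1], [0.1, 0.0, 0.3, 0.2],
     [0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0]]
mask = vertical_block_keep(Y, lam=1.0, J=0)
```

With $J = 0$ only the direct ancestors of the large coefficient are retained; a larger $J$ widens the retained column at every coarser level.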



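We then have

$$\begin{aligned} E\sum_{j,k}(\hat\theta_{j,k}-\theta_{j,k})^2 &\le E\sum_{j,k}\Big(1_{\{|Y_{j,k}|>\lambda_n\}}z_{j,k}^2 + 1_{\{|Y_{j,k}|\le\lambda_n\}}\theta_{j,k}^2 + 1_{\{(j,k)\text{ above a }|Y_{j',k'}|>\lambda_n\}}z_{j,k}^2\Big)\\ &\le E\sum_{j,k}\Big(1_{\{|Y_{j,k}|>\lambda_n\}}z_{j,k}^2 + 1_{\{|Y_{j,k}|\le\lambda_n\}}\theta_{j,k}^2 + 1_{\{|Y_{j,k}|>\lambda_n\}}\sum_{(j',k')\text{ above }(j,k)}z_{j',k'}^2\Big). \end{aligned}\tag{4.1}$$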



If $|\theta_{j,k}| \le 1$, then

$$E\,1_{\{|\theta_{j,k}+z_{j,k}|\le\lambda_n\}}\theta_{j,k}^2 \le \theta_{j,k}^2$$

and

$$E\,1_{\{|Y_{j,k}|>\lambda_n\}}\sum_{(j',k')\text{ above }(j,k)}z_{j',k'}^2 \le (2J+1)\log_2 n\;E\,1_{\{|z_{j,k}|>\lambda_n-1\}}z_{j,k}^2 \le \frac{(2J+1)\log_2 n}{n}.$$


If $|\theta_{j,k} + z_{j,k}| < \lambda_n$, then $|\theta_{j,k}| < |\lambda_n| + |z_{j,k}|$; thus



Moreover, 



and 



(j',k') above {j,k) 



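Hence

$$\frac{E\Big(1_{\{|Y_{j,k}|>\lambda_n\}}z_{j,k}^2 + 1_{\{|Y_{j,k}|\le\lambda_n\}}\theta_{j,k}^2 + 1_{\{|Y_{j,k}|>\lambda_n\}}\sum_{(j',k')\text{ above }(j,k)}z_{j',k'}^2\Big)}{1/n + \min(\theta_{j,k}^2, 1)} \le C\log n, \tag{4.2}$$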

for some constant $C$. Combining (4.1) and (4.2), we obtain

$$\frac{E\sum_{j,k}(\hat\theta_{j,k}-\theta_{j,k})^2}{1 + \sum_{j,k}\min(\theta_{j,k}^2, 1)} \le C\log n,$$

proving our claim on the minimaxity of the method.

Another interest of the vertical block thresholding method is the fact 
that it is close to a kernel estimate with locally varying bandwidth (this is 
precisely proved below in the case of the Haar wavelet). Indeed, a simple 
first-order approximation of the noisy wavelet coefficients is given by (since
$2^j$ is small compared to $n$)



where, as usual, $\psi_{j,k}$ and $\phi_{j,k}$ are, respectively, translations and dilations of
the wavelet $\psi$ and of the scaling function $\phi$.

If we estimate fi by discarding the levels below the level jo, then by a 
first-order approximation, as above. 



3>3(),k 



^j,k{^/n) v\ilJj,k{i/n) 



3>30,k\l 



$$= \frac{1}{n}\sum_{l} K(l/n, i/n)\,X_l,$$

where $K(x,y) = \sum_{j > j_0,\,k}\psi_{j,k}(x)\psi_{j,k}(y)$. If we also keep the level $j_0 + 1$, then
$K(x,y)$ has to be replaced by $K(2x,2y)/2$. Thus, the parameter $2^{h-j_0}$ corresponds
to the bandwidth of a classical linear kernel estimator (see also
[7]).

[Figure 1, panel titles: Wavelet thresholding; Kernel estimate with varying bandwidth.]

Fig. 1. Hard thresholding, vertical block thresholding and kernel estimate.

Figure 1 shows for an artificial signal which wavelet coefficients are kept
with different methods [the artificial signal is just a random signal; the coefficient
with level $j,k$ is a random variable with distribution $N(0, 2^{h-j})$].
(Nothing else but these coefficients is present in the signal.) The dark rectangles
correspond to coefficients which are kept. The top picture shows the
coefficients kept for a hard thresholding estimator, while the bottom one 
shows which coefficients are kept for a kernel estimator. The middle picture 
illustrates why the vertical block thresholding can be viewed as a kernel 
estimator with locally varying bandwidth (we keep some neighboring coef- 
ficients as well). 

This analogy between the vertical block thresholding estimator and kernel
estimators with locally varying bandwidth becomes even more transparent
by choosing the underlying wavelet basis to be the Haar basis. Then for
vertical block thresholding, each estimate $\hat f_i$ is the mean of some neighboring
$X_j$. For the Haar wavelet, the scaling identities have the forms

$$\phi_{j,k} = \frac{1}{\sqrt2}(\phi_{j+1,2k} + \phi_{j+1,2k+1}) \quad\text{and}\quad \psi_{j,k} = \frac{1}{\sqrt2}(\phi_{j+1,2k} - \phi_{j+1,2k+1}).$$

With this in mind, and for an input signal $X_0, \dots, X_{n-1}$, $n = 2^h$, the discrete
wavelet transform is given by

$$c_0 = \frac{1}{\sqrt n}\sum_{i=0}^{n-1}X_i \quad\text{and}\quad d_{j,k} = \frac{1}{\sqrt{2^{h-j}}}\Bigg(\sum_{i=0}^{2^{h-j-1}-1}X_{2^{h-j}k+i} - \sum_{i=2^{h-j-1}}^{2^{h-j}-1}X_{2^{h-j}k+i}\Bigg).$$

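The inverse transformation is then given by

$$X_i = \frac{1}{\sqrt n}c_0 + \sum_{j=0}^{h-1}\frac{d_{j,[i/2^{h-j}]}}{\sqrt{2^{h-j}}}\times\begin{cases} 1, & \text{if } i/2^{h-j} - [i/2^{h-j}] < 1/2,\\ -1, & \text{if } i/2^{h-j} - [i/2^{h-j}] \ge 1/2. \end{cases}$$

To compute an estimate of $f_i$, and discarding the levels below $j_0$, we have

$$\hat f_i = \frac{1}{\sqrt n}c_0 + \sum_{j=0}^{j_0}\frac{d_{j,[i/2^{h-j}]}}{\sqrt{2^{h-j}}}\times\begin{cases} 1, & \text{if } i/2^{h-j} - [i/2^{h-j}] < 1/2,\\ -1, & \text{if } i/2^{h-j} - [i/2^{h-j}] \ge 1/2, \end{cases} \;=\; \frac{1}{2^{h-j_0-1}}\sum_{l=[i/2^{h-j_0-1}]2^{h-j_0-1}}^{[i/2^{h-j_0-1}]2^{h-j_0-1}+2^{h-j_0-1}-1}X_l,$$

where the last equality follows from a simple induction argument on $j_0$.
Thus, if we discard the levels below $j_0$, then $\hat f_i$ is the mean of a block of neighboring $X_l$.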

We claim now that, if in vertical block thresholding the coefficient with
index $(j_1, [i/2^{h-j_1}])$ is kept because the coefficient with index $(j,k)$ is larger
than the threshold and $|[i/2^{h-j_1}] - [k/2^{j-j_1}]| \le J$, then for all $j_2 < j_1$,
$|[i/2^{h-j_2}] - [k/2^{j-j_2}]| \le J$, that is, the coefficients with indices $(j_2, [i/2^{h-j_2}])$,
$j_2 < j_1$, are also kept.

Since for $x \in \mathbb{R}$ and $k \in \mathbb{N}$, $[x/k] = [[x]/k]$, it is clear that

$$\big|[i/2^{h-j_1}]/2^{j_1-j_2} - [k/2^{j-j_1}]/2^{j_1-j_2}\big| \le J/2^{j_1-j_2},$$

hence

$$J \ge \big|[[i/2^{h-j_1}]/2^{j_1-j_2}] - [[k/2^{j-j_1}]/2^{j_1-j_2}]\big| = \big|[i/2^{h-j_2}] - [k/2^{j-j_2}]\big|.$$

Thus, for vertical block thresholding, we also obtain

$$\hat f_i = \frac{1}{2^{h-j_0-1}}\sum_{l=[i/2^{h-j_0-1}]2^{h-j_0-1}}^{[i/2^{h-j_0-1}]2^{h-j_0-1}+2^{h-j_0-1}-1}X_l,$$

but where now $j_0$ depends on $i$ and $(X_i)$, that is, it is a kernel estimator
with locally varying bandwidth.
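
The block-mean identity for the Haar basis can be checked numerically. The sketch below assumes the usual orthonormal discrete Haar transform on $n = 2^h$ samples, with levels $j = 0, \dots, h-1$ and positions counted from 0; the function names `haar_coefficients`, `reconstruct` and `block_means` are hypothetical.

```python
import math


def haar_coefficients(X):
    """Orthonormal discrete Haar transform of X, with len(X) = 2**h assumed.

    Returns c0 and a list d with d[j][k] for j = 0..h-1, k = 0..2**j - 1.
    """
    n = len(X)
    h = n.bit_length() - 1
    c0 = sum(X) / math.sqrt(n)
    d = []
    for j in range(h):
        size = 2 ** (h - j)              # support length at level j
        half = size // 2
        level = []
        for k in range(2 ** j):
            block = X[k * size:(k + 1) * size]
            level.append((sum(block[:half]) - sum(block[half:])) / math.sqrt(size))
        d.append(level)
    return c0, d


def reconstruct(c0, d, n, j0):
    """Inverse transform that keeps only the levels j = 0..j0."""
    h = n.bit_length() - 1
    f = []
    for i in range(n):
        value = c0 / math.sqrt(n)
        for j in range(j0 + 1):
            size = 2 ** (h - j)
            sign = 1.0 if (i % size) < size // 2 else -1.0
            value += sign * d[j][i // size] / math.sqrt(size)
        f.append(value)
    return f


def block_means(X, j0):
    """Mean of X over the dyadic block of length 2**(h - j0 - 1) containing i."""
    n = len(X)
    h = n.bit_length() - 1
    width = 2 ** (h - j0 - 1)
    means = []
    for i in range(n):
        start = (i // width) * width
        means.append(sum(X[start:start + width]) / width)
    return means


X = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0, 0.0, 8.0]
c0, d = haar_coefficients(X)
```

For every cut-off level `j0`, `reconstruct(c0, d, len(X), j0)` agrees with `block_means(X, j0)` up to rounding, so a cut-off that varies with position produces block means of varying width.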

Lepski, Mammen and Spokoiny [13] have already presented a kernel es- 
timator with locally varying bandwidth which achieves the same minimax 
rate as a wavelet thresholding estimator. There the local bandwidth is chosen
from a set $a^{-j}h_1$, with $a, h_1 > 0$ constants and $j = 0, 1, \dots$. For the simple
kernel estimator based on wavelets, the bandwidth is $2^{-j}$, $j = 0, 1, \dots$, and
$j$ is the last level of wavelet coefficients that is kept. The results of Lepski,
Mammen and Spokoiny [13] show that kernel estimates with a locally
varying bandwidth selection can be as good as wavelet thresholding in a 
minimax sense. The performance of vertical block thresholding also makes 
this plausible. 

APPENDIX 

Let us start with a simple lemma important in transferring part of the
proof to a truncated noise setting.

Lemma A.1. Let $X_i$, $i = 1, \dots, n$, be independent random variables such
that $EX_i = 0$, $EX_i^2 = 1$, and $m_4 := EX_i^4 < +\infty$. Let $Y := \sum_{i=1}^n a_iX_i$, where
$\sum_{i=1}^n a_i^2 = 1$. Then $\min(3, m_4) \le EY^4 \le \max(3, m_4)$.

Proof.

$$\begin{aligned} EY^4 &= m_4\sum_i a_i^4 + 3\sum_{i\ne j}a_i^2a_j^2\\ &= (m_4 - 3)\sum_i a_i^4 + 3\Big(\sum_i a_i^2\Big)^2\\ &= m_4 + (m_4 - 3)\Big(\sum_i a_i^4 - 1\Big). \end{aligned}$$

The assertion now follows from $\sum_i a_i^4 \le 1$. □
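
Lemma A.1 can be sanity-checked by exact enumeration for a small discrete law. The sketch below assumes a symmetric three-point distribution on $\{-a, 0, a\}$ with $P(\pm a) = q/2$ and $a = 1/\sqrt q$, so that the variance is 1 and $m_4 = 1/q$; this distribution and the names `fourth_moment_exact` and `fourth_moment_formula` are choices made for the illustration only.

```python
import itertools
import math


def fourth_moment_exact(weights, q):
    """E[Y^4] for Y = sum_i weights[i] * X_i, by enumerating all outcomes.

    The X_i are i.i.d. with P(X = +a) = P(X = -a) = q/2 and P(X = 0) = 1 - q,
    where a = 1/sqrt(q), so E X = 0, E X^2 = 1 and E X^4 = 1/q.
    """
    a = 1.0 / math.sqrt(q)
    atoms = [(-a, q / 2), (0.0, 1.0 - q), (a, q / 2)]
    total = 0.0
    for outcome in itertools.product(atoms, repeat=len(weights)):
        prob = math.prod(p for _, p in outcome)
        y = sum(w * x for w, (x, _) in zip(weights, outcome))
        total += prob * y ** 4
    return total


def fourth_moment_formula(weights, m4):
    """Closed form from the proof: 3 + (m4 - 3) * sum_i a_i^4."""
    return 3.0 + (m4 - 3.0) * sum(w ** 4 for w in weights)


raw = [3.0, 1.0, 2.0, 1.0]
norm = math.sqrt(sum(r * r for r in raw))
weights = [r / norm for r in raw]   # sum of squares equals 1
```

Taking `q = 0.2` gives a heavy-tailed case with $m_4 = 5 > 3$, and `q = 1.0` gives Rademacher variables with $m_4 = 1 < 3$; in both cases the enumerated moment matches the closed form and lies between $\min(3, m_4)$ and $\max(3, m_4)$.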

Proof of Theorem 2.1. Let us describe the general strategy of proof. 
The risk given by the denominator of (2.3) is split into two sums going 
from coarser to finer noisy wavelet coefficients. First the coefficients from 
a certain index upward, that is, the finer wavelet coefficients, are discarded 
because their $\ell^2$-norm is asymptotically negligible compared to the minimax
risk. Next, in view of the moment conditions imposed on e and from a 
proper choice of the thresholds, we can reduce the proof to a truncated 
noise case. The rest of the proof then deals with the core of the estimation 
problem which corresponds to the sum containing the coarser coefficients and 
truncated noise. There, the right thresholds can achieve the same minimax 
performance as in the Gaussian case. 

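Choose $\alpha, \varepsilon > 0$ such that

$$\alpha > \begin{cases} 1/(2m+1), & \text{if } p \ge 2,\\ m/((2m+1)s), & \text{if } 1 \le p < 2, \end{cases} \qquad\text{and}\qquad L > \frac{6}{(1-\alpha) - 2\varepsilon}.$$

This is certainly possible given the conditions of Theorem 2.1. Then let
$l = l(\alpha, n)$ be such that $2^l \le n^\alpha < 2^{l+1}$. In view of Lemma 2.2, it follows that

$$\sup\sum_{j > l,\,k}\theta_{j,k}^2 = o(n^{-2m/(2m+1)}). \tag{A.1}$$

But [see (1.5) and the references given there] the numerator in (2.3) is
$\sim Cn^{-2m/(2m+1)}$. Thus, choosing $\lambda_{j,k} = \infty$ for $j > l$, it suffices to show that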

inf(A)6K- supe: weWgm <AE<s>J2j,kiTx^^^i'Wj,k) - 0j,kf 
liminf — — ^ — ; — ■ > 1. 
Let $A_n := \{\max_i|e_i| \le c_n\}$, where $c_n = n^{-\varepsilon}2^{(h-l)/2} \le 2n^{-\varepsilon+(1-\alpha)/2}$, and
let also $\tilde e_i := e_i1_{A_n}$. Note that $\sigma_n^2$, the variances of the $\tilde e_i$, are smaller than 1,
but converge to 1 (if the $e_i$ are not identically distributed, the convergence
to 1 will hold uniformly). Finally, let $\tilde z_{j,k} := (W_n(\tilde e/\sqrt n))_{j,k}$.

On $A_n$, $\tilde e_i = e_i$, hence, denoting by $T_{j,k}$ the soft-thresholding operators
with thresholds $\lambda_{j,k}$ smaller than $\log n/\sqrt n$, for $j \le l$, we have

E ^ \Tj^k{(^j^k + Zj,k) — ^i,fcP 

\j<l,k 

+ X! \'^jA^i>k + Zj^k) - (^jM^^A-^ I 
j<l,k ) 

(A.2) =EY. + - 

< £" ^ \Tj^k{Oj,k + Zj^k) — Oj^kl'^lAn 

j<l,k 

- E \'^j:k{Oj,k + Zj^k) — Oj,kf 

+ 2n''^^{M + di)^P{A'n), 

where $d_n = O(\log n/\sqrt n)$, using the elementary inequality $(a + b)^4 \le 8(a^4 + b^4)$,
and using Lemma A.1 [$Ez_{j,k}^4 \le M := \max(3, Ee_1^4)/n^2$, since $Ee_i = 0$ and
$Ee_i^2 = 1$].

We will now show that the rightmost term in (A.2) is of order $o(1/n)$,
which is again asymptotically negligible compared to the minimax risk in
(2.3). Indeed, the $e_i$ have moments of order $L$; hence (using the i.i.d. assumption)

$$P(A_n^c) = P\Big(\max_{1\le i\le n}|e_i2^{-(h-l)/2}| > n^{-\varepsilon}\Big) \le nP\big(|e_1|2^{-(h-l)/2} > n^{-\varepsilon}\big) \le E|e_1|^L\,n^{1-L((1-\alpha)/2-\varepsilon)}.$$

This implies that $P(A_n^c) = O(1/n^2)$ if $1 - L((1-\alpha)/2 - \varepsilon) < -2$, that is, if

$$L > \frac{6}{(1-\alpha) - 2\varepsilon}, \tag{A.3}$$

and this proves our claim on the size of the rightmost term in (A.2). Thus,
we will be done if we prove that

inf(A)gK- SUpe: llell^™ <A^-f J2j,kiT^ ^{Wj,k) - Oj^kf 

(A.4) hminf ^ > 1. 

i|A||oo<logn/v^ 

Note that in (A.4) the symmetry assumption on $e$ ensures that $E\tilde e_i = 0$
for all $i$ and, thus, $E\tilde z_{j,k} = 0$ for all $j, k$.

Consider the coefficients in the levels $l$ and above with $2^l \le n^\alpha < 2^{l+1}$.
Let $z$ be the noise part in one of these coefficients. Then (see the proof of
Theorem 4.1 in [1]) with $n = 2^h$,

$$z = \sum_{i=1}^n v_i\frac{\tilde e_i}{\sqrt n} \quad\text{and}\quad \max_i|v_i| \le C_12^{-(h-l)/2} \le C_1\sqrt{n^{\alpha-1}}, \tag{A.5}$$

where $C_1$ depends only on the type of the wavelet transform used. Note also
that, by the scaling identities (e.g., see [3]),

$$\#\{v_i : v_i \ne 0\} \le C_2n^{1-\alpha} \quad\text{and, thus,}\quad \sum_i|v_i|^3 \le C_1^3C_2\sqrt{n^{\alpha-1}}, \tag{A.6}$$

where again $C_2$ depends only on the wavelet transform.

Since $\|\tilde e_i2^{-(h-l)/2}\|_\infty \le n^{-\varepsilon}$, $i = 1, \dots, n$, the noise terms in the wavelet
coefficients are sums of independent random variables which are smaller
than $C_1n^{-\varepsilon}/\sqrt n$, which in view of (A.5) and of (A.6) satisfy the conditions of
Lemmas 2.4 and 2.5.

In the sequel, for a law $\mu$, and for $\lambda \ge 0$ and $\theta \in \mathbb{R}$, we set $p_\mu(\lambda, \theta) :=
\int(T_\lambda(x + \theta) - \theta)^2\,\mu(dx)$. Let now $\mu_{j,k}$ denote the law of the random
variable $\tilde z_{j,k}$, that is, the distribution of the noise in the coefficient of index
$(j,k)$. [Recall that if the $e_i/\sqrt n$ are i.i.d. $N(0, 1/n)$ random variables, then
the distribution of the noise in each coefficient is $\Phi_n := N(0, 1/n)$. Recall
also that $Ez_{j,k}^2 = 1/n$ and, thus, $E\tilde z_{j,k}^2 \le 1/n$, and that, finally, $l$, $\mu_{j,k}$ and
$\lambda_{j,k}$ depend on $n$, but that for simplicity we choose not to indicate this in
the notation.]

Let An be the threshold such that p$„(A„,0) = and let 6 >0. If 

A > An, then 

{Tl ix + 9)- ef < (Tf (x + 9)- 9f for x G (-A, - 9, A„). 
Moreover, /^(x - Xnf^nidx) =p<i,„(An, 0)/2 and 

^" \{x + Xn + 9)-9f^n{dx)< [ ^\x + Xnf^n{dx)=p^„{Xn,0)/2. 
Hence



/An fOO 
(Tf (x + 9)- ef'^nidx) + / {Tlix + 9)- ef'^nidx) 

J —oo 

From the above inequality (and a similar one for 6 <0), it thus follows that 

inf(A)eK" supg. „ <AY.j.kP^{'^j.k-,(^j,k) 
(A.7) liminf— ^l^— , >l, 

n->oo mf (;,)gj,„ SUpg.||0|| <^X;,fcP<I'(Aj,fc,6'j,fc) 

since 

y J_<i = o(n-2"V{2m+l))^ 

and again, $n^{-2m/(2m+1)}$ is the minimax rate in the Gaussian case. This shows
that, without loss of generality and in the Gaussian case, we can assume that 

supjx^ fcAj^fc < A„ ~ ■ If Zj^f: had variance 1/n, it would be enough 

in order to complete the proof of the theorem (since also supj<; ^ < 
logn/ y/n) to show that 

(A.8) liminf inf inf sup inf^^^i%^>l. 

However, $\tilde e_i$ (and so $\sqrt n\,\tilde z_{j,k}$) has variance $\sigma_n^2$, which is smaller than 1 (but
converges to 1) and so a further little adjustment is needed. Let $\tilde\mu_{j,k}$ be $\mu_{j,k}$
rescaled to have variance $1/n$. A simple differentiation under the integral
shows that

^^'/i,,fc(^'^)• 
Hence, taking the sup over a larger set, in place of (A.8), it is enough to 
prove 

(A.9) liminf inf inf sup inf ^"^"^^J^^ > l. 

Note. From now on, we set $\mu_n := \tilde\mu_{j,k}$ and also set $p_n := p_{\mu_n}$. Moreover,
since performing computations with the factor $1/n$ is cumbersome, we will
multiply the random variables and thresholds by $\sqrt n$, and the risks by $n$.
The size of the fraction in (A.9) is unchanged by this transformation.
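Next, we need two simple inequalities. First,

$$\begin{aligned} p_\Phi(\lambda, \theta) &= \theta^2\,\Phi(-\lambda - \theta < x < \lambda - \theta) + \int_{\lambda-\theta}^{+\infty}(x - \lambda)^2\,\Phi(dx) + \int_{\lambda+\theta}^{+\infty}(x - \lambda)^2\,\Phi(dx)\\ &\ge \theta^2\,\Phi(-\lambda - \theta < x < \lambda - \theta) + \frac{p_\Phi(\lambda, 0) + p_\Phi(\lambda + \theta\,\mathrm{sgn}(\theta), 0)}{2}. \end{aligned}\tag{A.10}$$

For the second, let $\lambda \ge 1$. Then

$$p_\Phi(\lambda, 0) \ge 2\,\Phi(x > \lambda + 1)$$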



where the second inequality follows from a classic lower estimate on the 
standard normal distribution function (see [19], page 850). 
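
Both elementary bounds can be checked numerically for standard Gaussian noise. The sketch below evaluates the soft-thresholding risk in two ways, by a midpoint-rule integration of the squared loss and by the closed-form tail expression $\int_a^\infty (x-\lambda)^2\,\Phi(dx) = (1+\lambda^2)\,\Phi(x > a) + (a - 2\lambda)\varphi(a)$, and then tests (A.10) and the bound $p_\Phi(\lambda,0) \ge 2\Phi(x > \lambda+1)$. All function names, the integration range and the grid size are assumptions of the illustration.

```python
import math


def soft(y, lam):
    """Soft-thresholding operator T_lam(y)."""
    return math.copysign(max(abs(y) - lam, 0.0), y)


def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def upper_tail(a):
    """P(x > a) for a standard normal x."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def risk_numeric(lam, theta, lo=-12.0, hi=12.0, steps=48000):
    """Midpoint-rule value of the integral of (T_lam(x + theta) - theta)^2."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += (soft(x + theta, lam) - theta) ** 2 * phi(x)
    return total * dx


def tail(a, lam):
    """Integral over (a, infinity) of (x - lam)^2 against the normal law."""
    return (1.0 + lam * lam) * upper_tail(a) + (a - 2.0 * lam) * phi(a)


def middle_mass(lam, theta):
    """P(-lam - theta < x < lam - theta)."""
    return upper_tail(-lam - theta) - upper_tail(lam - theta)


def risk_closed(lam, theta):
    """First line of (A.10), evaluated in closed form."""
    return (theta ** 2 * middle_mass(lam, theta)
            + tail(lam - theta, lam) + tail(lam + theta, lam))


def lower_bound(lam, theta):
    """Second line of (A.10); lam + theta*sgn(theta) equals lam + |theta|."""
    return (theta ** 2 * middle_mass(lam, theta)
            + (risk_closed(lam, 0.0) + risk_closed(lam + abs(theta), 0.0)) / 2.0)
```

The closed form and the numerical integral agree to within the discretisation error, and `risk_closed(lam, theta) >= lower_bound(lam, theta)` holds on the grid of values tried, with equality at `theta = 0`.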

Let us now proceed to prove (A.9). By Lemma 2.5, there exists a sequence
$(\beta_n)$ converging to 1 and $\varepsilon_1 > 0$ [in view of (A.6) one can choose $\varepsilon_1 = (1 - \alpha)/2$],
independent of the index of the wavelet coefficient such that, for
$\alpha_n := \sqrt{\varepsilon_1\log n}$, and all $c$ such that $|c| \le \alpha_n$,

(A.12) P„<^P^ a,.d ,3„<*«-°°-'=» 



We distinguish two cases to prove (A.9), $\lambda < \alpha_n/2$ and $\lambda \ge \alpha_n/2$. Assume
first that $\lambda < \alpha_n/2$, and choose $\tilde\lambda = \lambda$. For fixed $\lambda$, let $r_\theta(x) := (T_\lambda(x + \theta) - \theta)^2$.
To spare us some further distinction of cases, assume that $\theta \ge 0$ (the
case $\theta < 0$ leads below to similar results). Then $r_\theta$ is a function with one
local minimum with value 0 at $x = \lambda$; moreover, if $\theta = 0$, then the minimum
is attained on $[-\lambda, \lambda]$. Hence, $r_\theta'(x) \ge 0$ for $x \ge \lambda$ and $r_\theta'(x) \le 0$ for $x \le \lambda$.
Thus, from

$$\int_{-\infty}^{+\infty}r_\theta(x)\,d\Phi(x) = \int_{-\infty}^{\lambda}(-r_\theta'(x))\,\Phi(x)\,dx + \int_{\lambda}^{+\infty}r_\theta'(x)(1 - \Phi(x))\,dx,$$

and [integrating by parts with also $r_\theta(\lambda) = 0$]

$$\int_{-\alpha_n}^{\alpha_n}r_\theta(x)\,d\mu_n(x) \le \int_{-\alpha_n}^{\lambda}(-r_\theta'(x))\,\mu_n((-\infty, x])\,dx + \int_{\lambda}^{\alpha_n}r_\theta'(x)\,\mu_n([x, \infty))\,dx,$$

and inequality (A.12), it easily follows that

I-^re{x)d<!>{x) 

Moreover, by Lemma 2.4 (with K„ = n^^ and a„ = logn), 
reix) dfin{x) < {X + \x\f dfin{x) 

{\x\>a„} J{\x\>a„} 

< / Ax'^ dfj,n{^) 
J{\x\>an} 

< 4{{al + 2)/cn exp{-alkn/2) + o(exp(-n"))) 
= o(p$(A,0)), 

where kn = 1 — n~^\ogn/2 and where the last identity is obtained using 
(A. 11) and A < a„/2. Since p#(A, 0) < p$(A, 6*), (A. 9) holds for A < q„/2. 
Now, in the second case, A > an/2, choose the smallest A such that 

(A.14) p$(A + 1,0) >p„(A,0) and A > A. 

It is a simple consequence of Lemma 2.4 and of the relation (A. 11) that 
A/A ^ 1 uniformly for A > a„/2 (recall that we assumed A < A„ ~ y/2 logn). 
Again we distinguish two cases. 

First, let 16*1 < 1. Since p„(A,6l) < 6*2 +p„(A,0) and from (A. 10), it follows 
that 

inf > inf ^''^((-A,A)) + (p.i.(A + 1,0) +p.,(A,0))/2 

>$((-Q„/2,a„/2))^l, 

using (A.14) and A > an/2. 

The case \9\ > 1 is more complicated. Assume that ^ > 1 (^ < —1 is 
treated in a similar fashion). Then, since p^{X,9) > p^{X, 1), it follows that 
p$(A, 6) > 1/2 and, thus, p^{X, 9) > 1/2, if a„ is sufficiently large. Moreover, 

/ {Tl{x + e)-e)\n{dx)< j Aix" + X^)yin{dx) 

J{\x\>an} J{\x\>an} 

= o{l), 

since A ~ A < A„ \/2\ogn. As in obtaining (A. 13), it is easy to see that 
. ^ ■ ^ J^^{T^{x + 9)-emdx) ^ 

X>l/2l>ljy^(T^{x + e)-9)^^inidx) ~" ' 
thus

liminf inf inf ' — ^ > 1. 

To finish the proof, and using A satisfying (A. 14), we show that 
liminf inf inf — ^ 

n^oo A>Q„/2e>lp^(A,6') 

,,,,, 1- ■ f ■ f ■ J^^{T^{x + e)-6)^Hdx) 
(A. 15) =limint mf mf — n— 

^ ^ '^-^ A>a„/2 9>l/+^(r£(x + 6')-0)2$(dx) 

> 1- 

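First, recall that

$$(T_\lambda(x + \theta) - \theta)^2 = \begin{cases} (x + \lambda)^2, & \text{if } x < -\lambda - \theta,\\ \theta^2, & \text{if } -\lambda - \theta \le x \le \lambda - \theta,\\ (x - \lambda)^2, & \text{if } x > \lambda - \theta, \end{cases}$$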
and thus, if x < A - 6*, then (Tf (x + 9)- 9f > {T-^{x + 9)- 9f. Hence, for 
A> A/2, 

e>ix<x/2 (Tf (x + 9)- 9Y - e>i ^<a/2 (x - A)2 

_ (A/2-A)2 
(A/2-A)2' 

which converges to 1 since A/ A ^ 1 . Finally, since 

-OO /■+00 



/ T-Mx + 9)-6Y^{dx)< / (x-A)2$((ix) 

JA/2 JA/2 

r+oo 

< / x2«>(dx) 

JA/2 

r+oo 

< / x2$((ix) = o(l) 

"'an/4 



and since p^{X,9) > 1/2, the relation (A. 15) holds. □ 